On the Performance of Latent Semantic Indexing based Information Retrieval

Authors

  • Cherukuri Aswani Kumar
  • Srinivas Suripeddi
Abstract

Conventional vector-based Information Retrieval (IR) models, namely the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower-dimensional space. It is stated in the IR literature that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to evaluate the reliability of such comparisons. The focus of this paper is to address this issue. We discuss the tradeoffs of VSM, GVSM, and LSI and evaluate the differences in their performance on four test document collections. We then analyze the statistical significance of these performance differences.
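
The projection into a lower-dimensional space that the abstract describes can be illustrated with a minimal sketch. It assumes LSI is computed with a truncated singular value decomposition of a term-by-document matrix and that queries are folded in by projecting onto the top k left singular vectors, which is one common convention and not necessarily the authors' exact setup. The toy matrix, the choice k = 2, and the function names lsi_fit, lsi_project, and cosine_scores are hypothetical and used only for illustration.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k latent dimensions."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Each row is one document projected onto the k left singular vectors
    # (equal to U_k.T @ term_doc, transposed).
    doc_vecs = Vt[:k, :].T * s_k
    return U_k, doc_vecs

def lsi_project(query, U_k):
    """Project a term-space query vector into the same k-dimensional space."""
    return query @ U_k

def cosine_scores(q_vec, doc_vecs):
    """Cosine similarity between one query vector and each row of doc_vecs."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    return (doc_vecs @ q_vec) / np.where(norms == 0, 1.0, norms)

# Toy term-by-document matrix (assumed data, not from the paper).
# Rows: car, automobile, engine, flower, petal. Columns: d0..d3.
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 0],   # petal
], dtype=float)

query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"

# Plain VSM: compare the query with documents in the full term space.
vsm_scores = cosine_scores(query, A.T)

# LSI: compare the query with documents in the reduced space (k = 2).
U_k, doc_vecs = lsi_fit(A, k=2)
lsi_scores = cosine_scores(lsi_project(query, U_k), doc_vecs)

print("VSM:", np.round(vsm_scores, 3))
print("LSI:", np.round(lsi_scores, 3))
```

In this toy corpus, document d1 contains "automobile" and "engine" but not "car", so plain VSM gives it no credit for the query, while the reduced space places it next to d0 because both share "engine". Whether such gains carry over to real collections is the question the paper's significance tests address.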

Similar resources

Classification and clustering methods for documents by probabilistic latent semantic indexing model

Based on information retrieval models, especially the probabilistic latent semantic indexing (PLSI) model, we discuss methods for classification and clustering of a set of documents. A classification method is presented, and its good performance is demonstrated by applying it to a set of benchmark documents in free format (text only). Then the classification method is modified to a clustering metho...

A Similarity-based Approach to Relevance Learning

In several information retrieval (IR) systems there is a possibility for user feedback. Many machine learning methods have been proposed that learn from the feedback information in a long-term fashion. In this paper, we present an approach that builds on user feedback across multiple queries in order to improve the retrieval quality of novel queries. This allows users of an IR system to retrieve...

Clustered SVD strategies in latent semantic indexing

The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term–document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...

Probabilistic Latent Semantic Indexing (Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval)

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast ...

Clustered SVD strategies in latent semantic indexing

The text retrieval method using Latent Semantic Indexing (LSI) technique with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...

Journal title:
  • CIT

Volume: 17

Publication date: 2009